In today’s practical we’re going to learn how to organize data in the tidy way.

Messy and Tidy data

The best way to understand the structure of tidy data is for us to collect some data, and then enter it in a tidy way.

Our tasks for today will involve:

  1. Creating some data (to do this, we’ll run a quick experiment)

  2. Enter that data into a spreadsheet (we’ll use google docs)

  3. Download the spreadsheet as a .csv file

  4. FInally, read that data into R Studio

Creating some data

To create some data, we’re just going to run a quick colour Stroop task. In the colour Stroop task you have to identify the colour of word. The words can be printed in Red, Blue, Yellow, or Green.

The words that are used in the task are colour names (e.g., Red, Blue, Green, and Yellow).

The words are presented under two conditions:

  1. In the congruent condition the colour of the word matches the name (e.g., Red, Blue)

  2. In the incongruent condition the colour of the word does not match the name (e.g., Blue, Red)

At the end of the task you’ll be presented with two numbers. One will be how quickly you could identify the colour for in the congruent condition, and the other will be how quickly you could identify the colour in the incongruent condition.


Write down these two numbers. Make sure you note which number is for which condition!

The next thing we need is a identifier for each of you. Participants identifier are necessary because we need to know which two numbers belong to which person.

The participant identifier should be unique so two participants don’t have the same one. However, you also shouldn’t use something like the participant’s name, because the participant’s data should be anonymised. To generate an anonymous participant ID, enter your name and a random number into the box below. This will generate your participant ID. You should also write this down.



Next, we’ll split the class into two groups as if we were doing a between groups study. Click the box below to get your group ID. You should also write this down.



We all should now have four bits of information. A reaction time for the congruent condition on the Stroop task, a reaction for the incongruent condition of the Stroop task, your participant ID, and your group ID.

Entering the data in a spreadsheet

Now we’ll enter this data into a spreadsheet. We’ll enter the data in the tidy way. This means that each of the two measurements will go on their own row.

Follow

When you’re entering the data, remember that identifiers have to be specified exactly

  1. The two identifiers for the two conditions are con (for the congruent condition) and inc (for the incongruent condition). Be careful to watch the spelling.

  2. For your id code, make sure you enter it exactly as it’s shown on the sheet above.

  3. The same applies for the group identifier

  4. And for the reaction time (rt)

Once the class has finished entering the data into the spreadsheet, we’ll download the spreadsheet onto our computers. To download the spreadsheet go to:

File > Download > Comma-separated values (.csv, current sheet)

Save the file into the data sub folder of your project folder and give it the name stroop.csv

Reading the data into R

Finally, we’ll read the data into R. Doing this will involve two steps.

  1. We’ll use the here::here() function to translate our directions from here into a file path (just as we did last week with the image file)

  2. We’ll use the readr::read_csv() function to actually read in the file.

Task 1

Again, we’ll start off by creating a R Project. You can call it whatever you want, but prac_08 is probably a good name.

Task 2

Create a new R Markdown (.Rmd) file and create out everything expect the header and the setup code chunk.

Task 3

Make sure you load the here, the readr, and the tibble package in the setup code chunk.

The library() function is used for loading packages

Task 4

Create a new code chuck. In this code chunk do the following things:

  1. Use here to give the directions to the data file

  2. Use the directions generated by here as the input to the read_csv function from the readr package. This will read in the data.

  3. Make sure you assign the output of the read_csv function to an object!

This is almost exactly the same as what we did last week, expect we’re using read_csv to read a data file instead of include_graphics to read an image file. If you’re stuck, check back to last week’s practical.

---
title: "Untitled"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# load the packages
library("here")
library("readr")
library("tibble")
```
 
```{r}
# read in the data
data_file <- here::here("data/stroop.csv")
stroop_tib <- readr::read_csv(file = data_file)
```

Task 5

Now that we’ve read our data into an object, add a new code chunk which simply calls that object, so that the content will be printed out below the code chunk.

Task 6

Now let’s try knit our R Markdown document, to make sure everything has worked correctly.


Task 7

For the next problem, we’re also going to try use the dplyr::select() function.

Add the code chunk below into your RMarkdown document

```{r}
penguin_data <- palmerpenguins::penguins
penguin_data
```

Once you’ve copied this code chunk into this document then make sure you run it, because it will load some data.

Task 8

Now that you have the data loaded, use dplyr::select() to select the island, and species column. You end result should look like the following:

## # A tibble: 344 × 2
##    island    species
##    <fct>     <fct>  
##  1 Torgersen Adelie 
##  2 Torgersen Adelie 
##  3 Torgersen Adelie 
##  4 Torgersen Adelie 
##  5 Torgersen Adelie 
##  6 Torgersen Adelie 
##  7 Torgersen Adelie 
##  8 Torgersen Adelie 
##  9 Torgersen Adelie 
## 10 Torgersen Adelie 
## # … with 334 more rows
```{r}
dplyr::select(.data = penguin_data, island, species)
```

Task 9

Now that you have a sense of how the dplyr::select() function works, let’s try a problem where you don’t have an output to compare it to.

For this problem, we’ll be using the stroop data tibble that we created earlier. This tibble should have a column for id, condition, rt and group. Since people were just put into two groups which didn’t really mean anything, let’s drop the group column from this tibble.

```{r}
dplyr::select(.data = stroop_data, -group)
```

Task 10

We’re not going to cover how to make pretty formatted tables this year. But for this task we’ll just do something very basic that won’t require much explanation.

To make a tibble into a nicely formatted table, we can use a function called kable() from the knitr package. This function takes one input x, which will just be a tibble.

Add a new code chunk where the tibble we created earlier is used at the input to the knitr::kable() function. Also, make sure you load the knitr package in the setup code chunk.

Once you’ve down this, knit the document again and see what the nicely formatted table looks like.

```{r}
knitr::kable(x = stroop_data)